Optimal Policies for Quantum Markov Decision Processes
Abstract
Markov decision process (MDP) offers a general framework for modelling sequential decision making where outcomes are random. In particular, it serves as a mathematical framework for reinforcement learning. This paper introduces an extension of MDP, namely quantum MDP (qMDP), that can serve as a mathematical model of decision making about quantum systems. We develop dynamic programming algorithms for policy evaluation and for finding optimal policies for qMDPs in the finite-horizon case. The results obtained in this paper provide some useful mathematical tools for reinforcement learning techniques applied to the quantum world.
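For orientation, the finite-horizon dynamic programming that the abstract refers to can be illustrated on a classical MDP. The sketch below is backward induction for an ordinary (non-quantum) MDP, not the qMDP algorithm from the paper. The function name `backward_induction`, the table layout `P[a][s][t]` and `R[a][s]`, and the two-state example are all assumptions made for illustration.

```python
def backward_induction(P, R, horizon):
    """Finite-horizon backward induction for a classical MDP (illustrative only).

    P[a][s][t] is the probability of moving from state s to state t under action a.
    R[a][s] is the expected immediate reward for taking action a in state s.
    Returns the optimal values at the first step and one decision rule per step.
    """
    n_actions = len(P)
    n_states = len(P[0])
    V = [0.0] * n_states  # assumed terminal value: zero in every state
    policy = []
    for _ in range(horizon):
        # Q[s][a]: immediate reward plus expected value of the next state.
        Q = [
            [R[a][s] + sum(P[a][s][t] * V[t] for t in range(n_states))
             for a in range(n_actions)]
            for s in range(n_states)
        ]
        # Greedy decision rule for this step (ties go to the lowest action index).
        rule = [max(range(n_actions), key=lambda a: Q[s][a]) for s in range(n_states)]
        V = [Q[s][rule[s]] for s in range(n_states)]
        policy.insert(0, rule)  # steps are computed last-to-first
    return V, policy


# Hypothetical two-state example: action 0 stays put, action 1 switches state.
# Staying in state 1 pays reward 1; everything else pays 0.
P = [
    [[1.0, 0.0], [0.0, 1.0]],  # action 0: stay
    [[0.0, 1.0], [1.0, 0.0]],  # action 1: switch
]
R = [
    [0.0, 1.0],  # action 0
    [0.0, 0.0],  # action 1
]
V, policy = backward_induction(P, R, horizon=3)
print(V, policy)
```

The returned `policy` is time-dependent, which is the usual shape of an optimal policy in the finite-horizon setting: the best action in a state can change as the remaining number of steps shrinks.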
Similar resources
Identification of optimal policies in Markov decision processes
In this note we focus on identifying optimal policies and on eliminating suboptimal policies when minimizing optimality criteria in discrete-time Markov decision processes with a finite state space and a compact action set. We present a unified approach to value iteration algorithms that generates lower and upper bounds on the optimal values, as well as on the current policy. Using the mod...
Discovery of Structured Optimal Policies in Markov Decision Processes
In this chapter we continue work on vfd, the novel method for discovering relative value functions for Markov decision processes that we introduced in Chapter 6. vfd discovers algebraic descriptions of relative value functions using ideas from the evolutionary algorithm field and, in particular, these descriptions include the model parameters of the MDP. We extend that work and demonstrate how a...
Constrained Markov Decision Process and Optimal Policies
In the course lectures, we discussed the unconstrained Markov decision process (MDP) at length, including the dynamic programming decomposition and optimal policies for MDPs. In this report we discuss a different model, the constrained MDP. There are many practical reasons to study constrained MDPs. For instance, in wireless sensor networks, each...
Accelerated decomposition techniques for large discounted Markov decision processes
Many hierarchical techniques for solving large Markov decision processes (MDPs) are based on partitioning the state space into strongly connected components (SCCs) that can be grouped into levels. In each level, smaller problems called restricted MDPs are solved, and these partial solutions are then combined to obtain the global solution. In this paper, we first propose a novel algorith...
Journal
Journal title: International Journal of Automation and Computing
Year: 2021
ISSN: 1751-8520, 1476-8186
DOI: https://doi.org/10.1007/s11633-021-1278-z